Geometry Selects Highly Designable Structures 
By enumerating all sequences of length 20, we study the designability of structures in a two-
dimensional Hydrophobic-Polar (HP) lattice model over a wide range of inter-monomer interaction
parameters. We find that although the histogram of designability depends on the interaction
parameters, the set of highly designable structures is invariant. In the HP lattice model, high
designability should therefore be a purely geometrical feature. Our results suggest two geometrical
properties of highly designable structures: they have the maximum number of contacts and a unique
neighborhood vector representation. We also show that the contribution of perfectly stable
sequences to the designability of structures plays a major role in making them highly designable.



Proteins are biological macromolecules consisting of linear
sequences of monomers, the 20 naturally occurring amino acids.
Each protein folds to a unique spatial structure, its native
state, which is the global minimum of its free energy. This
structure determines the function of the protein sequence in
nature.

It has been noted that certain structures are observed more
commonly among proteins than others. There are efforts to
explain this phenomenon by considering protein structure
designability, defined as the number of sequences that fold
successfully into a given structure. By definition, highly
designable structures (HDSs) are more likely to be found as
native states, and they are also stable against sequence
mutations. A recent numerical analysis of experimental data
revealed that the distribution of observed protein families over
different folds can be modeled by a highly stretched
exponential. This observation is consistent with the
designability explanation. Highly designable structures might
also represent attractive targets for protein design.

In the study of the designability of native structures, as for
many other features in the field of protein folding, the
complexity of the problem forces one to use simplified models.
A coarse-grained view of proteins introduces effective
inter-monomer interactions as a relevant factor in the
designability of structures. Analysis of the interactions
between the 20 amino acids suggests that amino acids can be
separated into Hydrophobic (H) and Polar (P) groups. This leads
to a very simple but highly popular model with two monomer
types, the HP model. Although the chains in this simple model
are far from realistic proteins and surely cannot explain many
features of real proteins, the model has been successful in
clarifying the concept of designability. Its simplicity allows
one to study the ground state properties of model proteins by
enumerating chains and configurations for short chains.

Enumerations on two- and three-dimensional lattice models have
shown the existence of a few highly designable structures among
many lowly designable ones. Some studies of the dynamical
properties of model chains that fold into the highly designable
lattice structures show that they are more protein-like. Such
sequences are thermodynamically more stable, and they fold to
the native state faster than random sequences.

There have been many efforts to find the factors determining
the high designability of a structure using simple models. One
element is the set of inter-monomer interactions. Some evidence
suggests that the set of highly designable structures depends on
the number of monomer types in the model. Although in lattice
models with short chains it is difficult to speak of helices and
sheets, it has been claimed that HDSs possess secondary
structure in two- and three-dimensional HP models. Using a
clever algebraic approach in the framework of a simple solvation
model, it has been shown that HDSs should be rare and atypical
in structure space. A recent argument shows that this result
remains valid for more general models.

In this paper we study the designability of lattice structures
over a wide range of interaction parameters between H and P
monomers, using a recently developed method. The results confirm
that the designability of structures depends on the
inter-monomer interactions, but interestingly we find that the
set of HDSs is invariant. Therefore, in this simple model,
geometry should play the essential role in selecting some
structures as the most highly designable. The correlation with
some other geometrical properties will also be shown.

We use a two-dimensional Hydrophobic-Polar (HP) lattice model
for sequences of length 20. Two-dimensional structures obviously
differ significantly from real three-dimensional ones, but
obtaining meaningful results in three dimensions requires much
longer chains, which are not computationally accessible. Short
chains in three dimensions do not have a natural ratio of core
sites. The method of this paper is based on the contact matrix,
so it can easily be generalized to any pair contact model.

In pair contact models the energy of a given sequence $\sigma$
in a given structure can be written as

$$E = \sum_{i<j} C_{ij}\, M_{\sigma_i \sigma_j}, \tag{1}$$

where $C_{ij}$ and $M_{\sigma_i \sigma_j}$ are elements of the
contact matrix (C) and the interaction matrix (M), respectively.
$C_{ij}$ is 1 if monomers i and j are non-sequential neighbors
and 0 otherwise. $\sigma_i$ is the ith component of the sequence
vector $\sigma$. In our HP model it is equal to 0 (1) if the ith
monomer in the sequence is a polar (hydrophobic) residue.
$M_{\sigma_i \sigma_j}$ is the interaction energy between
monomer types $\sigma_i$ and $\sigma_j$.
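As an illustration of Eq. (1), the following minimal Python sketch builds the contact matrix of a lattice structure and evaluates the pair contact energy. The function names, the example structure and the numerical values in the interaction matrix are assumptions chosen only for this illustration; sequences are encoded with 1 for H and 0 for P.

```python
from itertools import combinations

def contact_matrix(coords):
    """Return C with C[i][j] = 1 when monomers i and j are lattice
    neighbours but not adjacent along the chain, else 0."""
    n = len(coords)
    C = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if j - i > 1 and abs(xi - xj) + abs(yi - yj) == 1:
            C[i][j] = C[j][i] = 1
    return C

def pair_contact_energy(C, sigma, M):
    """Eq. (1): sum over pairs i < j of C[i][j] * M[sigma_i][sigma_j]."""
    n = len(sigma)
    return sum(C[i][j] * M[sigma[i]][sigma[j]]
               for i, j in combinations(range(n), 2))

# Hypothetical example: four monomers on a 2x2 square, so that the two
# chain ends (monomers 0 and 3) form the only non-sequential contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Illustrative interaction matrix indexed by 0 = P, 1 = H (assumed values).
M = [[-1.0, -2.0],
     [-2.0, -4.0]]
C = contact_matrix(square)
energy_hpph = pair_contact_energy(C, [1, 0, 0, 1], M)  # H at both chain ends
```

Because only the contact matrix enters the energy, two structures with the same contact matrix are indistinguishable for every sequence.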

On the two-dimensional square lattice there are 41,889,578
distinct structures (not related by rotation or reflection
symmetries) for a sequence of length 20. There is a contact
matrix corresponding to each structure. It is possible for
several structures to have the same contact matrix. Contact
matrices that correspond to more than one structure are called
degenerate contact matrices. The number of distinct
non-degenerate contact matrices is about 1 million, which is
much less than the number of all possible structures. The
maximum number of possible contacts for sequences of length 20
is 12, and the number of maximally compact structures, i.e.
those with the maximum number of contacts, is 503.
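The counting of distinct structures can be sketched for short chains as follows. This is a simple recursive enumeration of self-avoiding walks, not the enumeration code used for the results above; the function name and the symmetry-fixing convention (first step along +x, first step off the x axis along +y) are assumptions made for this example. In pure Python it is practical only for chains much shorter than 20 monomers.

```python
def enumerate_structures(n_monomers):
    """Yield self-avoiding walks of n_monomers sites on the square lattice,
    one representative per class under rotation and reflection.

    Symmetry is removed by fixing the first step to +x and requiring the
    first step that leaves the x axis to go in the +y direction.
    """
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(path, occupied, turned):
        if len(path) == n_monomers:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            if not turned and dy == -1:
                continue  # mirror image of a walk whose first turn is +y
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            occupied.remove(nxt)
            path.pop()

    if n_monomers == 1:
        yield [(0, 0)]
        return
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start), False)

# Number of distinct structures for chains of 2 to 7 monomers.
counts = [sum(1 for _ in enumerate_structures(n)) for n in range(2, 8)]
```

The chain direction is not identified with its reverse here, since a sequence placed on a structure distinguishes the two ends.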

In the HP model the monomers are divided into two groups,
Hydrophobic (H) and Polar (P). The interaction matrix M is thus
a 2 x 2 matrix with elements $E_{HH}$, $E_{HP}$ and $E_{PP}$. By
choosing an arbitrary energy scale, we can parameterize these
three elements of the interaction matrix in terms of two
positive parameters, $\gamma$ and $E_c$:

$$E_{HH} = -2 - \gamma - E_c, \qquad E_{HP} = -1 - E_c, \qquad E_{PP} = -E_c. \tag{2}$$



Substituting these elements of the interaction matrix into
equation (1) gives the following simple expression for the
configuration energy:

$$E = -m - E_c\, b - \gamma\, a, \tag{3}$$

where m, b and a are three non-negative integers, related to
$\sigma$ and C as follows:

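$$m = \sigma^{t} \cdot C \cdot \mathbf{1}, \qquad
a = \tfrac{1}{2}\, \sigma^{t} \cdot C \cdot \sigma, \qquad
b = \tfrac{1}{2}\, \mathbf{1}^{t} \cdot C \cdot \mathbf{1}. \tag{4}$$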
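The step from Eqs. (1) and (2) to Eqs. (3) and (4) can be made explicit. The short derivation below is a sketch that assumes the encoding $\sigma_i = 1$ for H and $\sigma_i = 0$ for P, and uses only the symmetry of C and the vanishing of its diagonal.

```latex
% One expression that reproduces the three entries of Eq. (2):
M_{\sigma_i \sigma_j} = -E_c - (\sigma_i + \sigma_j) - \gamma\, \sigma_i \sigma_j

% Summing over contacts, with C_{ij} = C_{ji} and C_{ii} = 0:
E = \sum_{i<j} C_{ij}\, M_{\sigma_i \sigma_j}
  = -E_c \underbrace{\tfrac{1}{2} \sum_{i,j} C_{ij}}_{b}
    \;-\; \underbrace{\sum_{i,j} \sigma_i\, C_{ij}}_{m}
    \;-\; \gamma \underbrace{\tfrac{1}{2} \sum_{i,j} \sigma_i\, C_{ij}\, \sigma_j}_{a}
```

In words, b counts all contacts, m counts contacts weighted by the number of H monomers taking part in them, and a counts H-H contacts.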



Using the set of possible ground states of sequences enables us
to find the ground state of any sequence for any given values of
the energy parameters. In this way the designability of all
structures over a wide range of the energy parameters $E_c$ and
$\gamma$ is obtained by enumerating all $2^{20}$ sequences.
Fig. 1 shows the average designability of structures in a 10 by
10 square region of the ($E_c$, $\gamma$) space, sampled with
mesh 0.1. Structures with zero designability are excluded from
this average. 19,132 structures (about 0.05% of all structures)
have non-zero designability for at least some values of $E_c$
and $\gamma$. As one can see in this figure, the average
designability shows two different regimes in the space of energy
parameters. Over a wide area of energy parameters it has a value
of order 10, but for large $E_c$ and small $\gamma$ it rapidly
jumps to several hundred. This can be explained by the fact that
for $E_c \gg \gamma$ the contribution of b to the configuration
energy (Eq. 3) dominates. Therefore, all native structures are
among the most compact ones. We call this area of interaction
parameters the compact regime. In this regime the reduction in
the number of native structures increases the average
designability. Alternatively, when $\gamma$ is greater than or
comparable to $E_c$, some non-compact configurations can compete
with compact ones. We call this area the swollen regime.
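For very short chains the definition of designability can be checked by brute force. The sketch below is not the ground-state-set method referred to above: it simply evaluates every sequence on every structure for one pair of parameters and counts the sequences that have a given structure as their unique ground state. All function names and the parameter values are assumptions for illustration.

```python
from itertools import product

def structures(n):
    """Self-avoiding walks of n >= 2 sites, one per rotation/reflection class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if (not turned and dy == -1) or nxt in path:
                continue
            yield from extend(path + [nxt], turned or dy != 0)
    yield from extend([(0, 0), (1, 0)], False)

def contacts(coords):
    """Pairs (i, j), j > i + 1, that sit on neighbouring lattice sites."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 2, n)
            if abs(coords[i][0] - coords[j][0])
            + abs(coords[i][1] - coords[j][1]) == 1]

def energy(pairs, seq, e_c, gamma):
    """Eq. (3) evaluated contact by contact, with seq[i] = 1 for H, 0 for P."""
    return sum(-e_c - (seq[i] + seq[j]) - gamma * seq[i] * seq[j]
               for i, j in pairs)

def designability(n, e_c, gamma, tol=1e-9):
    """For each structure, count the sequences with it as unique ground state."""
    contact_lists = [contacts(s) for s in structures(n)]
    counts = [0] * len(contact_lists)
    for seq in product((0, 1), repeat=n):
        energies = [energy(p, seq, e_c, gamma) for p in contact_lists]
        lowest = min(energies)
        ground = [k for k, e in enumerate(energies) if e - lowest < tol]
        if len(ground) == 1:  # sequences with degenerate ground states are skipped
            counts[ground[0]] += 1
    return counts

# Four monomers: only the square-shaped structure has a contact, so with
# negative interactions it is the ground state of all 2**4 sequences.
d4 = designability(4, e_c=1.0, gamma=0.5)
```

The cost grows as the number of structures times $2^N$, which is why this direct approach is limited to toy sizes.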

Fig. 2 shows the histogram of designability for two pairs of
values of $E_c$ and $\gamma$. Fig. 2a is for the pair $E_c = 3$,
$\gamma = 8$, which is a point in the swollen regime, and Fig. 2b
is for the pair $E_c = 9$, $\gamma = 0.5$, which is a point in
the compact regime. In the swollen regime, where native
structures are not restricted to highly compact structures,
there are many lowly designable structures and a few HDSs. In
the compact regime, by contrast, there are many intermediately
designable structures and again a few HDSs. Similar results are
reported when one compares histograms of designability for a
fixed set of energy parameters using the search space of all
structures and that of compact structures. Therefore, the
interaction parameters merely choose the search space, and
inside each regime (compact or swollen) the statistics of
designability do not change qualitatively.

The relevant question is whether the set of HDSs depends on the
interaction parameters. To study this, we sort the structures by
their designabilities at each set of energy parameters. In this
manner, the standing of any structure in the competition for
designability is given by its rank. Rank one is the most
designable structure for the given energy parameters. Averaging
the rank of a structure over the whole space of energy
parameters gives a good picture of its overall tendency toward
high designability.
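The ranking procedure just described can be sketched in a few lines. The input layout (one row of designabilities per parameter point), the function names, the tie-breaking rule and the toy numbers are assumptions made for this example.

```python
def ranks(designabilities):
    """Rank structures by designability, rank 1 = most designable.
    Ties are broken by structure index (an arbitrary choice)."""
    order = sorted(range(len(designabilities)),
                   key=lambda k: (-designabilities[k], k))
    result = [0] * len(designabilities)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result

def average_rank(table):
    """table[p][k] = designability of structure k at parameter point p."""
    per_point = [ranks(row) for row in table]
    n_points = len(per_point)
    return [sum(r[k] for r in per_point) / n_points
            for k in range(len(table[0]))]

# Toy data: three structures at two parameter points (made-up numbers).
toy = [[5, 1, 3],
       [4, 2, 9]]
avg = average_rank(toy)
```

In this toy table the first and third structures exchange the top two ranks, so both end up with the same average rank, while the second structure is last at both points.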

Figure 3 shows the histogram of this average rank for all
structures in the studied square region of energy parameters.
The interesting point in this diagram is that there are a few
structures with very small average rank. For example, there are
8 structures with average rank less than 10, and the lowest
average rank is 1.48. Since the rank of a structure is a
positive quantity, a small average shows that the structure has
a small rank throughout the space of energy parameters. Thus
structures with very low average rank are always on the list of
HDSs. We compared the behavior of the average rank of structures
with two limiting cases. If the ranks of structures were
invariant under changes of the energy parameters, the histogram
would be completely flat (solid line in figure 3). On the other
hand, if the ranks changed in an uncorrelated way as the
inter-monomer interactions change, the result would be a very
sharp Gaussian distribution as a consequence of the central
limit theorem (dashed line in figure 3). As one can see in
figure 3, the average rank behaves more like the quenched-rank
case than the random case. In particular, for low ranks the
histogram lies on the solid line very well. This shows that the
ranks of HDSs are more rigid than the ranks of lowly designable
structures. So the set of HDSs does not depend on the
interaction parameters, although the designability of structures
does. The fact that the interaction parameters play no important
role in choosing a structure as an HDS demonstrates the
importance of geometry. This leads to the question of which
purely geometrical properties or symmetries select some
structures as HDSs. Furthermore, it justifies restricting the
search for HDSs to highly compact structures.
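The two limiting cases can be quantified with a short estimate. The following is a sketch under stated assumptions: N structures are ranked at K parameter points, and in the uncorrelated case the rank of a structure at each point is taken as an independent draw from the uniform distribution on 1, ..., N. The symbols N, K and $\bar r$ are introduced here only for illustration.

```latex
% Fixed ranks: every value 1, ..., N occurs exactly once as an average rank,
% so the histogram of \bar r is flat.

% Uncorrelated ranks: \bar r is a mean of K independent uniform variables, so
\langle \bar r \rangle = \frac{N+1}{2}, \qquad
\operatorname{Var}(\bar r) = \frac{N^{2}-1}{12\,K}, \qquad
\sigma_{\bar r} \approx \frac{N}{\sqrt{12\,K}}
```

Taking, purely as an illustration, N = 19,132 (the structures with non-zero designability) and K of order $10^4$ (a 10 by 10 region with mesh 0.1), this gives a width of roughly 55 around a mean near N/2, which is narrow compared with the full range of ranks.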

Recently we have shown that in the HP lattice model there are
some sequences which have only one non-degenerate possible
ground state in the whole space of energy parameters. Because
the native structures of these sequences are perfectly stable
against changes in the inter-monomer interactions, we call them
perfectly stable sequences (PSSs). PSSs give a constant
contribution to the designability of structures. For any given
interaction parameters, each set of sequences that share a
common ground state structure contains an invariant subset
consisting of PSSs. The designability of any structure thus has
a constant part equal to the number of members of its invariant
subset of sequences. In our model, about 7% of the sequences of
length 20 are PSSs. These PSSs select 489 structures out of the
503 most compact structures as their absolute native states,
i.e. only 489 structures have a non-zero constant part of
designability. (PSSs cannot select non-compact structures as
ground states, because for large enough $E_c$ the compact
structures will have lower energies.)
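The notion of a perfectly stable sequence can be illustrated numerically on a toy example. The sketch below scans a finite grid of ($E_c$, $\gamma$) values and keeps the sequences whose unique ground state is the same at every grid point; a grid scan is only a necessary check and is not the method used to identify PSSs above. The hand-listed four-monomer structures, the grid and all names are assumptions for illustration.

```python
from itertools import product

def contacts(coords):
    """Pairs (i, j), j > i + 1, that sit on neighbouring lattice sites."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 2, n)
            if abs(coords[i][0] - coords[j][0])
            + abs(coords[i][1] - coords[j][1]) == 1]

def energy(pairs, seq, e_c, gamma):
    """Eq. (3) contact by contact, with seq[i] = 1 for H and 0 for P."""
    return sum(-e_c - (seq[i] + seq[j]) - gamma * seq[i] * seq[j]
               for i, j in pairs)

def unique_ground_state(contact_lists, seq, e_c, gamma, tol=1e-9):
    """Index of the single lowest-energy structure, or None if degenerate."""
    energies = [energy(p, seq, e_c, gamma) for p in contact_lists]
    lowest = min(energies)
    ground = [k for k, e in enumerate(energies) if e - lowest < tol]
    return ground[0] if len(ground) == 1 else None

def stable_ground_state(contact_lists, seq, grid):
    """Common unique ground state over all grid points, or None."""
    found = None
    for e_c, gamma in grid:
        g = unique_ground_state(contact_lists, seq, e_c, gamma)
        if g is None or (found is not None and g != found):
            return None
        found = g
    return found

# The five distinct structures of a four-monomer chain (hand-listed).
four_mers = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],   # straight
    [(0, 0), (1, 0), (2, 0), (2, 1)],   # bend at the end
    [(0, 0), (1, 0), (1, 1), (1, 2)],   # bend at the start
    [(0, 0), (1, 0), (1, 1), (2, 1)],   # zigzag
    [(0, 0), (1, 0), (1, 1), (0, 1)],   # square, one contact
]
contact_lists = [contacts(s) for s in four_mers]
grid = [(float(i), float(j)) for i in range(1, 11) for j in range(1, 11)]
pss = {seq: stable_ground_state(contact_lists, seq, grid)
       for seq in product((0, 1), repeat=4)}
# Constant part of designability: number of stable sequences per structure.
constant_part = [sum(1 for g in pss.values() if g == k)
                 for k in range(len(four_mers))]
```

In this toy case every sequence is trivially stable because only one structure has any contact; for longer chains the stable sequences form a proper subset.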

There is a strong correlation between the designability and its
constant part. In figure 4 we plot the constant part of
designability against the designability for the two sets of
interaction parameters used in figure 2. The figure shows that
designability is nearly proportional to its constant part, but
the slope is a function of the interaction parameters.

Indeed, in the swollen regime the slope is of order unity
(fig. 4a). This means that the invariant subsets of sequences
dominate the designability of structures. In the compact regime,
where the constant part makes only a small contribution to the
designability, these quantities still remain nearly
proportional, and the most highly designable structures have the
larger constant parts (fig. 4b). This correlation is notable
because by definition the source of designability is different
from that of its constant part. The former reflects stability
against monomer mutations in the sequence, and the latter
reflects the ability of a structure to be an
interaction-independent native state. The strong correlation
between these quantities suggests that they may have the same
geometrical origin. It has been reported that designability
correlates well with the energy gap. So it can be claimed that a
large constant part in the designability of a structure is a
sign of a large average energy gap. In fact, when a PSS exists,
changing the interaction parameters does not change the native
structure of the sequence. So the excited states of the sequence
should be far away, separated by a large energy gap.

As mentioned above, compactness is a necessary condition for
HDSs because all interaction parameters are negative. We have
found an additional geometrical symmetry of HDSs which is
related to the solvation nature of proteins. In an HP solvation
model for proteins, an H monomer lowers the energy in proportion
to the number of its non-sequential contacts in the structure,
and the position of P monomers does not change the energy. In a
simpler version of the solvation model all non-core monomers
behave the same (i.e. it takes both corner and edge monomers as
surface). In contrast to this simpler version, the solvation
model is a pair contact model. Our pair contact model in the
special case $\gamma = 0$ becomes a solvation model. (We may
alternatively call it additive.) This special case has some
theoretical advantages because the energy has a simpler form. In
this case a is an irrelevant parameter for calculating the
configuration energy (Eq. 3), and one can rewrite m and b
(Eq. 4) as follows:



$$m = \sigma^{t} \cdot V, \qquad b = \tfrac{1}{2}\, \mathbf{1}^{t} \cdot V, \tag{5}$$



where V is the neighborhood vector (NV). Its ith component is
the number of non-sequential neighbors of the ith monomer and is
related to the contact matrix by

$$V_i = \sum_{j} C_{ij}. \tag{6}$$



In this case the information needed to calculate the
configuration energy can be coded in a vector instead of a
matrix. Obviously this is a source of additional degeneracy in
the energy spectrum. This degeneracy is equal to the number of
spatial structures which have the same NV, and we denote it by
$N_d$. In our enumeration of native configurations of length 20,
the largest $N_d$ for NVs is 4. We found that all HDSs are among
those structures with unique NVs ($N_d = 1$). Thus this
additional geometric symmetry is another common property of
HDSs.
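The neighborhood vector and its degeneracy can be computed directly from lattice coordinates. The following minimal sketch uses assumed function names and a hypothetical six-monomer example; it evaluates Eq. (6), the $\gamma = 0$ energy from Eqs. (3) and (5), and the number of structures sharing each NV.

```python
from collections import Counter

def neighborhood_vector(coords):
    """V[i] = number of non-sequential lattice neighbours of monomer i (Eq. 6)."""
    n = len(coords)
    v = [0] * n
    for i in range(n):
        for j in range(i + 2, n):
            if (abs(coords[i][0] - coords[j][0])
                    + abs(coords[i][1] - coords[j][1])) == 1:
                v[i] += 1
                v[j] += 1
    return tuple(v)

def solvation_energy(v, seq, e_c):
    """Eqs. (3) and (5) at gamma = 0: E = -sigma.V - E_c * (1.V) / 2."""
    return -sum(s * vi for s, vi in zip(seq, v)) - e_c * sum(v) / 2

def nv_degeneracy(structures):
    """Map each neighbourhood vector to N_d, the number of structures sharing it."""
    return Counter(neighborhood_vector(s) for s in structures)

# Hypothetical six-monomer examples: a 2x3 hairpin and a straight chain.
hairpin = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
straight = [(k, 0) for k in range(6)]
v_hairpin = neighborhood_vector(hairpin)
e_hairpin = solvation_energy(v_hairpin, [1, 0, 0, 0, 0, 1], e_c=1.0)
n_d = nv_degeneracy([hairpin, straight])
```

Here the hairpin has contacts between monomers 0 and 5 and between monomers 1 and 4, and its energy for the sequence HPPPPH agrees with summing Eq. (3) over those two contacts at $\gamma = 0$.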

Table 1. The average designability of structures

          Number of contacts
N_d       9        10       11       12
1         0.85     2.07     14.17    692.78
2         -        1.09     4.97     155.71
3         -        1.87     2.65     101.17
4         -        -        2.69     40.04



Table 1 shows the average designability of structures with a
given compactness and NV degeneracy. It can be inferred that the
average designability of structures increases with growing
compactness and with decreasing $N_d$. Thus structures which
have the highest compactness and unique NVs are good candidates
for being highly designable. These are only necessary conditions
for high designability, since some lowly designable structures
also fulfill these criteria. In fact the structures should also
be atypical to be HDSs. Recently it was shown that these
atypical structures possess α-helices, which are another
characteristic of real proteins. The uniqueness of the NV
representation of structures can also give an explanation for
the ratio of surface to core monomers in real proteins.

In summary, we have studied the designability of structures in a
two-dimensional HP pair contact lattice model over a wide range
of inter-monomer interaction parameters consistent with the HP
constraints. We find the designability of all structures by
enumerating all sequences of length 20. Our results confirm that
changing the inter-monomer interactions affects the
designability of structures and also chooses the search space of
native states, but the set of HDSs is invariant. Therefore
geometry should play the essential role in selecting some
structures as the most highly designable.

In some regions of the inter-monomer interaction parameter
space, the constant contribution of PSSs to the designability of
structures is dominant in the selection of HDSs. Even in those
regions where the designability is much larger than its constant
part, there is still a strong correlation between the
designability and its constant part. Thus those structures which
are attractive for PSSs are in the set of HDSs. This suggests a
close relation between the average energy gap of a structure and
the constant part of its designability.

We find two necessary geometrical conditions for a structure to
be highly designable. The first is compactness. This is because
the PSSs select only highly compact structures as absolute
native states. We also find that all HDSs have a non-degenerate
NV representation. On average, the designability of structures
decreases with increasing degeneracy of the neighborhood vector
and with decreasing compactness. This result shows that pair
contact models with two monomer types give the same result for
the designability of HDSs as the solvation model, which is
consistent with a recent study. The relevant question that
remains is what will happen in models with more than two monomer
types.
Figure Captions

Figure 1.
The average designability of all native structures against the interaction parameters $E_c$ and
$\gamma$ in a 10 by 10 square region with mesh 0.1. The compact and swollen regimes correspond to
areas of interaction space with large and small average designability, respectively.

Figure 2.
Histogram of designability for two pairs of interaction parameters: a) $E_c = 3$ and $\gamma = 8$
(swollen regime), b) $E_c = 9$ and $\gamma = 0.5$ (compact regime).

Figure 3.
The histogram of the number of structures with average ranks inside intervals of width 50,
compared with the uncorrelated (dashed line) and fixed (solid line) cases.

Figure 4.
The designability of structures vs. the number of PSSs which choose the structure as their unique
ground state (the constant part of designability) for two pairs of interaction parameters:
a) $E_c = 3$ and $\gamma = 8$ (swollen regime), b) $E_c = 9$ and $\gamma = 0.5$ (compact regime).


